Analyzing Retweet Dynamics of German MPs in the 19th Legislative Period

Assignment 2: Data Analysis Exercise

Author

Lukas Warode

library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)

Load and Manipulate Dataset

In my Master’s Thesis, I scale German MPs (MdBs) of the 19th legislative period across ideological dimensions that are determined in an unsupervised way by using parliamentary speeches and Tweets. While retweet dynamics do not play a big role in the scaling algorithm per se, party affiliation does and is particularly interesting for network analytical approaches, especially in the realm of clustering.

tweet_df_final <- readRDS("data/tweet_df_final_2.rds")

# further meta information 
od_tw_info <- read_csv("data/od_tw_merge_final.csv")

How many Tweets are there in total? Over 1.3 Million - quite a lot.

tweet_df_final %>% 
  nrow()
[1] 1301427

How many Twitter users are there in total across all MPs of the 19th legislative period?

tweet_df_final %>% 
  select(author_id, tw_id) %>% 
  distinct() %>% 
  nrow()
[1] 515

Create data.frame that exhibits information of referenced Tweets

retweet_df <- tweet_df_final %>% 
  select(referenced_tweets, tweet_type_character, tweet_type_final, id) %>% 
  mutate(
    # type of referenced tweet
    ref_tweet_types = map(1:length(referenced_tweets),
                          ~ referenced_tweets[[.x]][["type"]]),
    # id of referenced tweet (NOT id of user)
    ref_tweet_id = map(1:length(referenced_tweets),
                       ~ referenced_tweets[[.x]][["id"]])
  )  

Get author information of Tweets that were referenced

author_id_ref_tweet <- retweet_df %>% 
  filter(ref_tweet_types == "retweeted") %>% 
  mutate(ref_tweet_id = unlist(ref_tweet_id)) %>% 
  left_join(
    tweet_df_final %>% select(id, author_id, tw_id), 
    by = c("ref_tweet_id" = "id")
  ) %>% 
  filter(!is.na(author_id), !is.na(tw_id)) %>% 
  select(id, ref_tweet_id, author_id, author_id, tw_id)

Create final data.frame that only consists of retweets that occured within the dataset

Filter for retweets and merge author information from retweeted Tweets.

# vector with IDs from Tweets that are from authors within the dataset
tweet_ids <- tweet_df_final$id

retweet_within_df <- retweet_df %>% 
  filter(ref_tweet_types == "retweeted") %>% 
  filter(ref_tweet_id %in% tweet_ids) %>% 
  left_join(tweet_df_final %>% select(id, author_id, tw_id)) %>% 
  # directedness: from (orginal tweet) -> to (retweeted tweet) 
  left_join(author_id_ref_tweet, by = "id", suffix = c("_to", "_from"))

How many Tweets are left after constructing the “within-retweet data.frame”? Just about 100000 or 7.7% of all Tweets.

retweet_within_df %>% nrow()
[1] 100311
paste0((nrow(retweet_within_df) / nrow(tweet_df_final) * 100) %>% round(1), "%")
[1] "7.7%"

Validate Retweet-Network Structure

retweet_within_df %>% select(tw_id_from, tw_id_to) %>% head(5)
      tw_id_from       tw_id_to
1   katjakipping dorisachelwilm
2    victorperli dorisachelwilm
3  niemamovassat dorisachelwilm
4 michel_brandt_ dorisachelwilm
5 dietmarbartsch dorisachelwilm

This makes actually sense! Doris Achelwilm (MP from Left Party) retweets Katja Kipping and Dietmar Bartsch, which do both play a central role in the Left Party: They were party leaders in the 19th legislative period.

Plot most retweeted MPs

First, let us create a vector with party colors.

party_colors <- c("#138BD8", "#000000", "#FFEE0A", "#529222", "#AE1862", "#E30019")
names(party_colors) <- c("AfD", "CDU/CSU", "FDP", "Grüne", "DIE LINKE.", "SPD")

We also create a ggplot2 theme, which is based on theme_bw(), but has some refinements.

theme_custom <- function() {
  font <- "Corbel"
  # base theme
  theme_bw() %+replace%
    theme(
      text = element_text(family = font),
      legend.text = element_text(size = 9),
      legend.title = element_blank(),
      legend.box.background = element_rect(
        colour = "black",
        fill = "white",
        linetype = "solid"
      ),
      # grid lines
      panel.grid.major = element_line(color = "grey60", size = 0.2),
      panel.grid.minor = element_line(color = "grey80", size = 0.1),
      # faceting
      strip.background = element_blank(),
      strip.text = element_text(color = "black"),
    )
}

Now, we set the new theme as default.

theme_set(theme_custom())

We also create a data.frame that just includes the party affiliation per MP.

party_user <- tweet_df_final %>% 
  distinct(tw_id, party)

Plot Top 20

Now, we count how often per MP a Tweet was retweeted by another MP. We plot the to 20 MPs in total and visualize the party affiliation.

retweet_within_df %>% 
  group_by(tw_id_from) %>%
  count() %>%
  ungroup() %>% 
  arrange(desc(n)) %>% 
  top_n(20) %>% 
  left_join(party_user, by = c("tw_id_from" = "tw_id")) %>%
  ggplot(aes(x = n, y = reorder(tw_id_from, n), fill = party)) + 
  geom_col() +
  scale_fill_manual(values = party_colors) +
  labs(y = "", title = "Most Prominent MPs by Retweets of other MPs") +
  theme_custom()

Interestingly, the AfD and FDP dominate the chart, while there is no single SPD MP within the top 20 most retweeted MPs within. Let’s check the pattern and plot by party! Overall, the results seem to make sense, given that party leaders and influential MPs are the most retweeted MPs according to the bar plot.

Plot Top 20 by Party

Now, let’s turn the attention to the most retweeted MPs by party. Both CDU/CSU and SPD, the biggest parties and also only members of the governing coalition of the 19th legislative period do not have as many influential MPs, measured by number of retweets across their colleagues, as opposition parties have.

retweet_within_df %>% 
  left_join(party_user, by = c("tw_id_from" = "tw_id")) %>%
  filter(!party %in% c("Fraktionslos", NA)) %>% 
  group_by(tw_id_from, party) %>%
  count() %>%
  group_by(party) %>% 
  arrange(desc(n)) %>%
  top_n(10) %>% 
  ggplot(aes(x = n, y = reorder(tw_id_from, n), fill = party)) + 
  geom_col() +
  facet_wrap(~ party, scales = "free_y", ncol = 2) +
  scale_fill_manual(values = party_colors) +
  labs(y = "", title = "Most Prominent MPs by Retweets of other MPs", subtitle = "Grouped by Party") +
  theme_custom()

Construct Retweet-Network

In a Retweet-Network each retweet is represented as a directed link in a network that connects from the original Tweet to the referenced retweet. Users that are often retweeted within the network are expected to be associated with some degree of (digital) political relevance, even though there a lot of potential confounding variables (e.g. number of Tweets, number of Followers, general social media behaviour and association with socio-demographic factors such as cohort affiliation).

retweet_network <- retweet_within_df %>% 
  select(from = tw_id_from, to = tw_id_to) %>% 
  as_tbl_graph(directed = TRUE) %>% 
  left_join(party_user, by = c("name" = "tw_id")) 
 
# get corresponding row ID (from ID) 
row_id_from <- retweet_network %>% 
  as_tibble() %>%
  mutate(id_from = row_number())

# update retweet network with edge information: name and party of retweeted users (from)
retweet_network <- retweet_network %>% 
  activate(edges) %>% 
  left_join(row_id_from, by = c("from" = "id_from")) %>% 
  rename(party_from = party, name_from = name) %>% 
  activate(nodes)

retweet_network
# A tbl_graph: 487 nodes and 100311 edges
#
# A directed multigraph with 1 component
#
# Node Data: 487 × 2 (active)
  name           party     
  <chr>          <chr>     
1 katjakipping   DIE LINKE.
2 victorperli    DIE LINKE.
3 niemamovassat  DIE LINKE.
4 michel_brandt_ DIE LINKE.
5 dietmarbartsch DIE LINKE.
6 susanneferschl DIE LINKE.
# … with 481 more rows
#
# Edge Data: 100,311 × 4
   from    to name_from     party_from
  <int> <int> <chr>         <chr>     
1     1    44 katjakipping  DIE LINKE.
2     2    44 victorperli   DIE LINKE.
3     3    44 niemamovassat DIE LINKE.
# … with 100,308 more rows

Degree Centrality

Out-degree

In the previous plots, we already calculated the out-degree centrality (which is the same as the number of outgoing edges per node or the number of original Tweets that have been retweeted within). Nevertheless, double-checking is always a good practice!

# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
  mutate(
    # in_degree = centrality_degree(mode = "in"),
    out_degree = centrality_degree(mode = "out")
  ) %>% 
  arrange(desc(out_degree)) %>% 
  select(name, out_degree, party) %>% 
  as_tibble() %>% 
  head(20) %>% 
  knitr::kable()
name out_degree party
c_lindner 4066 FDP
udohemmelgarn 3983 AfD
alice_weidel 3853 AfD
frank_pasemann 3794 AfD
brihasselmann 2475 Grüne
marcobuschmann 2294 FDP
schneider_afd 1830 AfD
beatrix_vstorch 1554 AfD
gtzfrmming 1330 AfD
dietmarbartsch 1208 DIE LINKE.
m_reichardt_afd 1164 AfD
konstantinkuhle 1162 FDP
olliluksic 1154 FDP
paulziemiak 1022 CDU/CSU
joanacotar 1021 AfD
konstantinnotz 956 Grüne
renner_afd 933 AfD
nicole_hoechst 921 AfD
jensspahn 901 CDU/CSU
johannesvogel 888 FDP

Now, let’s plot the distribution of the out-degree. There are a few outliers at the upper end.

retweet_network %N>%
  mutate(
    # in_degree = centrality_degree(mode = "in"),
    out_degree = centrality_degree(mode = "out")
  ) %>% 
  as_tibble() %>% 
  ggplot(aes(x = out_degree)) + 
  geom_histogram() +
  labs(x = "Out-Degree") +
  theme_custom()

The out-degree distribution is quite comparable across parties: Most MPs were never retweeted, but have important retweet hubs. However, for the Green and Left Party, the distribution seems to be a bit less skewed.

retweet_network %N>%
  filter(!party %in% c("Fraktionslos", NA)) %>% 
  mutate(
    # in_degree = centrality_degree(mode = "in"),
    out_degree = centrality_degree(mode = "out")
  ) %>% 
  as_tibble() %>% 
  ggplot(aes(x = out_degree, fill = party)) + 
  geom_histogram() +
  labs(x = "Out-Degree") +
  facet_wrap(~ party, scales = "free") +
  scale_fill_manual(values = party_colors) +
  theme_custom()

In-degree

The in-degree corresponds to the “inverse retweet behaviour”: It indicates which MPs retweeted Tweets of other MPs.

# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
  mutate(
    in_degree = centrality_degree(mode = "in"),
    # out_degree = centrality_degree(mode = "out")
  ) %>% 
  arrange(desc(in_degree)) %>% 
  select(name, in_degree, party) %>% 
  as_tibble() %>% 
  head(20) %>% 
  knitr::kable()
name in_degree party
udohemmelgarn 6002 AfD
schneider_afd 5001 AfD
frank_pasemann 3583 AfD
martinrosemann 3409 SPD
renner_afd 3270 AfD
olliluksic 2140 FDP
nicole_hoechst 1735 AfD
drandreasnick 1567 CDU/CSU
stbrandner 1519 AfD
rkiesewetter 1505 CDU/CSU
reinholdmdb 1494 FDP
c_jung77 1284 FDP
k_sa 1225 Grüne
elsnervongronow 1187 AfD
tschipanski 1006 CDU/CSU
ullinissen 991 SPD
andi_wagner 969 DIE LINKE.
gtzfrmming 868 AfD
djanecek 834 Grüne
steinekecdu 807 CDU/CSU
retweet_network %N>%
  mutate(
    in_degree = centrality_degree(mode = "in"),
    # out_degree = centrality_degree(mode = "out")
  ) %>% 
  as_tibble() %>% 
  ggplot(aes(x = in_degree)) + 
  geom_histogram() +
  labs(x = "In-Degree") +
  theme_custom()

retweet_network %N>%
  filter(!party %in% c("Fraktionslos", NA)) %>% 
  mutate(
    in_degree = centrality_degree(mode = "in"),
    # out_degree = centrality_degree(mode = "out")
  ) %>% 
  as_tibble() %>% 
  ggplot(aes(x = in_degree, fill = party)) + 
  geom_histogram() +
  labs(x = "In-Degree") +
  facet_wrap(~ party, scales = "free") +
  scale_fill_manual(values = party_colors) +
  theme_custom()

The in-degree distribution seems to be a bit more normally distributed, meaning that “inverse hubs”, or MPs that heavily retweeted other MPs, are also present but not as dominant as retweeted MP hubs. However, the AfD seems to dominate the retweet dynamic from this perspective, as the table above shows.

Comparing Out- and In-degree

In- and out-degree seem to be correlated (\(R = 0.53\)), however, there are outliers that influence the relationship heavily.

# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
  mutate(
    in_degree = centrality_degree(mode = "in"),
    out_degree = centrality_degree(mode = "out")
  ) %>% 
  select(name, in_degree, out_degree) %>% 
  as_tibble() %>% 
  left_join(party_user, by = c("name" = "tw_id")) %>%
  ggplot(aes(out_degree, in_degree)) + 
  geom_point(alpha = 0.5) +
  ggpubr::stat_cor() +
  theme_custom()

It seems that there are a few outliers at the upper end of the distribution. Let’s see whether the correlation holds when they are removed.

The correlation is still quite high (\(R = 0.38\)) and highly significant, when we remove the MPs with the top 20 highest in- and out-degrees.

top_20_in <- retweet_network %>% 
  mutate(in_degree = centrality_degree(mode = "in")) %>% 
  arrange(desc(in_degree)) %>% 
  as_tibble() %>% 
  top_n(20) %>% 
  pull(name)

top_20_out <- retweet_network %>% 
  mutate(out_degree = centrality_degree(mode = "out")) %>% 
  arrange(desc(out_degree)) %>% 
  as_tibble() %>% 
  top_n(20) %>% 
  pull(name)

# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
  mutate(
    in_degree = centrality_degree(mode = "in"),
    out_degree = centrality_degree(mode = "out")
  ) %>% 
  select(name, in_degree, out_degree) %>% 
  as_tibble() %>% 
  left_join(party_user, by = c("name" = "tw_id")) %>%
  filter(!name %in% top_20_in, !name %in% top_20_out) %>% 
  ggplot(aes(out_degree, in_degree)) + 
  geom_point(alpha = 0.5) +
  ggpubr::stat_cor() +
  theme_custom()

retweet_network %N>%
  filter(!party %in% c("Fraktionslos", NA)) %>% 
  mutate(
    in_degree = centrality_degree(mode = "in"),
    out_degree = centrality_degree(mode = "out")
  ) %>% 
  select(name, in_degree, out_degree) %>% 
  as_tibble() %>% 
  left_join(party_user, by = c("name" = "tw_id")) %>%
  filter(!name %in% top_20_in, !name %in% top_20_out) %>% 
  ggplot(aes(out_degree, in_degree)) + 
  geom_point(alpha = 0.5, aes(color = party)) +
  ggpubr::stat_cor(show.legend = F) +
  facet_wrap(~ party) +
  scale_color_manual(values = party_colors) +
  theme_custom()

Distance / Shortest Path Length

Most of the users are connected across (a path length of) 2 or 3 retweets.

distances_retweet_network <- retweet_network %>% 
  distances() %>%
  table() %>% 
  as_tibble() %>% 
  rename(Distance = 1) 

distances_retweet_network %>% 
  knitr::kable()
Distance n
0 487
1 18554
2 105456
3 101042
4 11290
5 332
6 8
distances_retweet_network %>%
  ggplot(aes(x = Distance, y = n)) + geom_col() +
  ggtitle("Distribution of Distance (Shortest Path)") +
  theme_custom()

Network Visualization

Create a new custom theme, based on ggraph::theme_graph().

theme_graph_custom <- function() {
  font <- "Corbel"
  # base theme
  theme_graph() %+replace%
    theme(
      text = element_text(family = font),
      legend.text = element_text(size = 9),
      legend.title = element_blank(),
      legend.box.background = element_rect(
        colour = "black",
        fill = "white",
        linetype = "solid"
      ),
      # faceting
      strip.background = element_blank(),
      strip.text = element_text(color = "black"),
    )
}

Standard Plot

Using ggraph auto/default options to plot all nodes and edges, we can see that there is a strong pattern of “intra-party retweeting”. Interestingly, while most parties members exhibits almost exclusively retweet within the party, there is a strong visual intersection of SPD and Left Party retweets. To a lesser extent, this also repeats for party of the Greens with the SPD

plot_all_auto <- retweet_network %>% 
  ggraph(layout = "auto") + 
  geom_edge_link(alpha = 0.015) +
  geom_node_point(aes(color = party))  +
  scale_color_manual(values = party_colors) + 
  theme_graph_custom() 

plot_all_auto

Circular

Now let’s visualize the network in a circular way, highlighting the retweet dynamics in a way that underlines the party structure of the dataset.

plot_all_circular <- retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>% 
  ggraph(layout = "linear", circular = TRUE) + 
  geom_edge_arc(aes(color = party_from), alpha = 0.015) +
  scale_edge_color_manual(values = party_colors) + 
  coord_fixed() +
  theme_graph_custom() +
  theme(legend.position = "none")

plot_all_circular

Further Centrality Measures

Closeness Centrality

The Closeness Centrality is defined as the inverse total distance of paths between a node and all other nodes in the network. Shorter paths to others are thus an indicator of node centrality.

closeness_retweet_network <- retweet_network %>% 
  mutate(Closeness = centrality_closeness())

The Closeness Centrality is not dominated by the AfD, which was exhibiting the MPs with the highest degrees. This finding underlines that centrality measures that go beyond the degree centrality play an important role when assessing the actual dynamics of the dataset.

closeness_retweet_network %>% 
  filter(Closeness != 1) %>% 
  arrange(desc(Closeness)) %>% 
  select(name, Closeness, party) %>% 
  as_tibble() %>% 
  head(20) %>% 
  knitr::kable()
name Closeness party
konstantinkuhle 0.0011416 FDP
karl_lauterbach 0.0010776 SPD
brihasselmann 0.0010672 Grüne
marcobuschmann 0.0010616 FDP
heikomaas 0.0010604 SPD
larsklingbeil 0.0010504 SPD
johannesvogel 0.0010194 FDP
rkiesewetter 0.0010194 CDU/CSU
renatekuenast 0.0010183 Grüne
thomasoppermann 0.0010183 SPD
mgrossebroemer 0.0010163 CDU/CSU
c_lindner 0.0010152 FDP
konstantinnotz 0.0010152 Grüne
schneidercar 0.0010121 SPD
katarinabarley 0.0010121 SPD
nouripour 0.0010101 Grüne
mastrackzi 0.0010091 FDP
sigmargabriel 0.0010091 SPD
lambsdorff 0.0010070 FDP
cem_oezdemir 0.0010040 Grüne

The closeness centrality seems to be actually normally distributed, while there are some outliers on the lower bound.

closeness_retweet_network %>% 
  filter(Closeness != 1) %>% 
  as_tibble() %>% 
  ggplot(aes(x = Closeness)) + 
  geom_density() +
  theme_custom()

While most parties have an approximately equally distributed Closeness Centrality, the AfD has the lowest one with a visually perceivable difference. For interpretational reasons, this is likelly due to internal heterogeneity, given the populist character and short existence period of the party.

closeness_retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>%
  filter(Closeness != 1) %>% 
  as_tibble() %>% 
  ggplot(aes(x = Closeness, y = reorder(party, Closeness), fill = party)) + 
  ggridges::stat_density_ridges(alpha = 0.5, quantile_lines = T, quantiles = 4) +
  scale_fill_manual(values = party_colors) +
  theme_custom() +
  labs(y = "")

Betweenness Centrality

Next to Closeness, there is Betweenness Centrality. Betweenness is defined as the number of shortest paths that go through a node (divided by all shortest paths). It is thus also a measure of Centrality, but with a different statistical focus.

betweenness_retweet_network <- retweet_network %>% 
  mutate(Betweenness = centrality_betweenness())
betweenness_retweet_network %>% 
  arrange(desc(Betweenness)) %>% 
  select(name, Betweenness, party) %>% 
  as_tibble() %>% 
  head(20) %>% 
  knitr::kable()
name Betweenness party
tschipanski 26989.810 CDU/CSU
s_muenzenmaier 19464.496 AfD
stbrandner 18131.114 AfD
olliluksic 14463.042 FDP
rkiesewetter 12089.569 CDU/CSU
_martinneumann 8452.709 FDP
fabiodemasi 8202.163 DIE LINKE.
joanacotar 8010.589 AfD
hubertus_heil 7998.812 SPD
c_lindner 7090.924 FDP
gtzfrmming 6647.937 AfD
timon_gremmels 6566.856 SPD
wanderwitz 6504.270 CDU/CSU
matthiashauer 6361.569 CDU/CSU
udohemmelgarn 6303.254 AfD
mgrossebroemer 6059.420 CDU/CSU
sven_kindler 5791.922 Grüne
larsklingbeil 5217.861 SPD
drandreasnick 4975.422 CDU/CSU
konstantinnotz 4778.380 Grüne

The Betweenness distribution is heavily skewed. There are only 5 MPs that have a Betweenness > 10000.

betweenness_retweet_network %>% 
  as_tibble() %>% 
  ggplot(aes(x = Betweenness)) + 
  geom_histogram() +
  theme_custom()

For matters of visualization, it thus makes sense to plot the log of the Betweenness.

betweenness_retweet_network %>% 
  as_tibble() %>% 
  ggplot(aes(x = Betweenness)) + 
  geom_histogram() +
  scale_x_log10() +
  theme_custom()

After seeing the log distribution, it is easier to compare the distribution across the parties. While most parties have an unimodal distribution, the AfD has a flatter distribution with a lower average betweenness and no real peak. This confirms the finding of the applied Closeness Centrality.

betweenness_retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>%
  filter(Betweenness != 1) %>% 
  as_tibble() %>% 
  ggplot(aes(x = Betweenness, color = party)) + 
  facet_wrap(~ party) +
  geom_density() +
  scale_x_log10() +
  # geom_density() +<
  scale_color_manual(values = party_colors) +
  theme_custom() +
  labs(y = "")

Eigenvector Centrality

Eigenvector Centrality takes into consideration the whole network when assessing centrality. The assigned centrality depends of a node depends on the centrality of neighbours

eigen_retweet_network <- retweet_network %>% 
  mutate(eigen = centrality_eigen())

Interestingly, the AfD dominates the Eigenvector Centrality. Given the retweet dynamic that most AfD MPs seem to retweet a lot within the party, but not just to certain central hubs, this is an intersting finding.

eigen_retweet_network %>%
  filter(eigen != 1) %>% 
  arrange(desc(eigen)) %>% 
  select(name, eigen, party) %>% 
  as_tibble() %>% 
  head(20) %>% 
  knitr::kable()
name eigen party
udohemmelgarn 0.7522489 AfD
schneider_afd 0.4050560 AfD
renner_afd 0.3067681 AfD
alice_weidel 0.2783748 AfD
nicole_hoechst 0.2208869 AfD
m_reichardt_afd 0.1844001 AfD
elsnervongronow 0.1345158 AfD
stbrandner 0.1104824 AfD
beatrix_vstorch 0.0965541 AfD
gtzfrmming 0.0935327 AfD
tino_chrupalla 0.0607585 AfD
stefankeuterafd 0.0543826 AfD
r_hartwig_afd 0.0441641 AfD
joanacotar 0.0410962 AfD
s_muenzenmaier 0.0379447 AfD
buettner_sdl 0.0290164 AfD
DrMEspendiller 0.0274493 AfD
petrbystronafd 0.0267201 AfD
gottfriedcurio 0.0208919 AfD
martin_hess_mdb 0.0198649 AfD

The dynamic is also repeated visually: Most parties have an approximately normally distributed Eigenvector Centrality, while the AfD has a peak on the right side (“left skewed”).

eigen_retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>%
  filter(eigen != 1) %>% 
  as_tibble() %>% 
  ggplot(aes(x = eigen, color = party)) + 
  facet_wrap(~ party) +
  geom_density() +
  scale_x_log10() +
  # geom_density() +<
  scale_color_manual(values = party_colors) +
  theme_custom() +
  labs(y = "")

Community Detection

For substantive reasons, community detection is an interest approach given the already existing party clusters, that are visible when plotting the whole network. In addition, since we are dealing with a directed network, not every type of community detection algorithm is appropriate.

Components

Let’s start with the basic components detection.

group_comp_retweet_network <- retweet_network %>% 
  mutate(
    group_comp_weak = group_components("weak"),
    group_comp_strong = group_components("strong")
  )

Applying the “weak” component algorithm yields a redundant community detection: Every node is part of the same community

group_comp_retweet_network %>% distinct(group_comp_weak)
# A tbl_graph: 1 nodes and 5 edges
#
# A directed multigraph with 1 component
#
# Node Data: 1 × 1 (active)
  group_comp_weak
            <int>
1               1
#
# Edge Data: 5 × 4
   from    to name_from    party_from
  <int> <int> <chr>        <chr>     
1     1     1 katjakipping DIE LINKE.
2     1     1 katjakipping DIE LINKE.
3     1     1 katjakipping DIE LINKE.
# … with 2 more rows

However, when applying the “strong” definition, there are 42 distinct communities detected. “Strong” refers to detecting cliques, meaning that every of those 42 subgroups are fully connected within.

group_comp_retweet_network %>% distinct(group_comp_strong) 
# A tbl_graph: 42 nodes and 21 edges
#
# A directed multigraph with 40 components
#
# Node Data: 42 × 1 (active)
  group_comp_strong
              <int>
1                 1
2                15
3                13
4                12
5                11
6                40
# … with 36 more rows
#
# Edge Data: 21 × 4
   from    to name_from      party_from
  <int> <int> <chr>          <chr>     
1     3    19 edrossmann     SPD       
2     6     6 siegbertdroese AfD       
3     8     8 frankeedgar    SPD       
# … with 18 more rows

Let’s now have a look on how the found components correspond to the party structure.

group_comp_retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>%
  as_tibble() %>% 
  group_by(party, group_comp_strong) %>% 
  count() %>% 
  arrange((desc(n))) %>% 
  knitr::kable()
party group_comp_strong n
CDU/CSU 1 101
SPD 1 86
FDP 1 80
AfD 1 63
Grüne 1 59
DIE LINKE. 1 53
AfD 8 1
AfD 9 1
AfD 11 1
AfD 38 1
AfD 39 1
AfD 40 1
AfD 41 1
AfD 42 1
CDU/CSU 2 1
CDU/CSU 3 1
CDU/CSU 5 1
CDU/CSU 6 1
CDU/CSU 10 1
CDU/CSU 12 1
CDU/CSU 15 1
CDU/CSU 18 1
CDU/CSU 19 1
CDU/CSU 20 1
CDU/CSU 22 1
CDU/CSU 23 1
CDU/CSU 24 1
CDU/CSU 25 1
CDU/CSU 26 1
CDU/CSU 34 1
CDU/CSU 35 1
CDU/CSU 36 1
DIE LINKE. 16 1
FDP 17 1
Grüne 37 1
SPD 4 1
SPD 7 1
SPD 13 1
SPD 14 1
SPD 21 1
SPD 27 1
SPD 28 1
SPD 29 1
SPD 30 1
SPD 31 1
SPD 32 1
SPD 33 1

Turns out, that most parties are part of group 1, by far. The algorithm is not really informative as well.

group_comp_retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>%
  as_tibble() %>% 
  ggplot(aes(x = group_comp_strong, fill = party)) + 
  facet_wrap(~ party) +
  geom_bar() +
  scale_fill_manual(values = party_colors) +
  theme_custom() 

Infomap

As the components did not yield sufficient discrimiant communities, let’s try out other algorithms. Infomap follows the random walker approach.

set.seed(42)
group_infomap_retweet_network <- retweet_network %>% 
  mutate(
    group_infomap = group_infomap()
  )

It seems to work, at least partially! Especially the AfD communities are detected efficiently and thus separated from the other parties. However, a validated finding would need a sufficient amount of trials and tuning to get rid of the random component.

The parties are differently present across the groups, which seems satisfies the discrimant goal.

group_infomap_retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>%
  as_tibble() %>% 
  group_by(party, group_infomap) %>% 
  count() %>% 
  arrange(party, desc(n)) %>% 
  group_by(party) %>% 
  slice(1:3) %>% 
  knitr::kable(caption = "Top 3 Infomap Groups by Party")
Top 3 Infomap Groups by Party
party group_infomap n
AfD 4 22
AfD 6 21
AfD 3 10
CDU/CSU 2 16
CDU/CSU 12 14
CDU/CSU 10 12
DIE LINKE. 8 13
DIE LINKE. 23 9
DIE LINKE. 28 7
FDP 7 11
FDP 1 8
FDP 5 8
Grüne 24 9
Grüne 1 7
Grüne 33 6
SPD 9 9
SPD 20 9
SPD 22 9
group_infomap_retweet_network %>% 
  filter(!party %in% c("Fraktionslos", NA)) %>%
  as_tibble() %>% 
  ggplot(aes(x = group_infomap, fill = party)) + 
  facet_wrap(~ party) +
  geom_bar() +
  scale_fill_manual(values = party_colors) +
  theme_custom() 

Further Work

For the last assignment, it would be interesting to dive in a bit more into the correspondence of certain community detection and clustering algorithms and check their correspondence with the party structure. In addition, given that we are dealing with social media data dynamics, there is a lot of social media context data available. Every Tweet has a certain number of likes and comments attached that could serve as weights. Furthermore, every profile exhibits a number of followers that indicate the social media relevance. It would be interesting to check whether and which type of centrality is statistically associated with the number of followers.